Fix #938: Call win32 APIs directly #942
|
/ok to test |
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test |
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test 7828876 |
rwgk
left a comment
Wonderful!
It'd be ideal to also reduce the code duplication (get_cuda_version(), cdef extern from "windows.h":) in a set of follow-on PRs. I believe it'll be pretty straightforward after this PR and the associated codegen PRs are merged.
The files that are generated by cybind are including the externs in a template, so they at least aren't copy-pasted. That is, all of the templates for each library have a It might have been nicer to |
|
/ok to test 8c7ea2e |
|
Wow, "CI / Test linux-64 / py3.10, 13.0.0, wheels, GPU l4 (push)" is failing after 4m. I haven't seen that flake for a while. Just rerun and ignore. The timing being off by a small margin certainly isn't due to a problem in cuda-bindings. |
rwgk
left a comment
Tests passed, except for that one flake. I think a rerun of that test will resolve the flake.
|
Let's |
leofang
left a comment
FYI, I noticed the cuFile module is not changed?
Ah, I think that's just an oversight on my part. I will regenerate that as well. |
|
/ok to test e0e868a |
|
/ok to test 7a46173 |
|
/ok to test 488bdc2 |
|
/ok to test 673974c |
@mdboom, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
/ok to test dcce4f5 |
|
@kkraus14: I merged main into here for one final test, but otherwise no changes since your last "approved" review. |
|
Description
Instead of using pywin32, this PR calls the win32 APIs directly using Cython extern.
closes #938
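To illustrate the idea of going straight to the win32 loader APIs rather than through pywin32, here is a minimal Python sketch using ctypes. It is not the PR's code: the PR uses Cython extern declarations, and the description does not list which win32 functions are called. The choice of LoadLibraryExW and GetProcAddress, the helper name resolve_symbol, and the non-Windows fallback (included only so the sketch runs anywhere) are all assumptions made for illustration.

```python
import ctypes
import sys

if sys.platform == "win32":
    from ctypes import wintypes

    # Bind kernel32 once and declare the two loader functions explicitly,
    # so ctypes does not truncate 64-bit handles or pointers.
    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _kernel32.LoadLibraryExW.argtypes = [
        wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD
    ]
    _kernel32.LoadLibraryExW.restype = wintypes.HMODULE
    _kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, wintypes.LPCSTR]
    _kernel32.GetProcAddress.restype = ctypes.c_void_p

    def resolve_symbol(library, symbol):
        """Return the address of `symbol` in `library` via win32 calls."""
        handle = _kernel32.LoadLibraryExW(library, None, 0)
        if not handle:
            raise ctypes.WinError(ctypes.get_last_error())
        address = _kernel32.GetProcAddress(handle, symbol.encode("ascii"))
        if not address:
            raise ctypes.WinError(ctypes.get_last_error())
        return address
else:
    def resolve_symbol(library, symbol):
        """Non-Windows stand-in (assumption): use the platform loader."""
        handle = ctypes.CDLL(library)  # library=None opens the main program
        return ctypes.cast(getattr(handle, symbol), ctypes.c_void_p).value


# Hypothetical usage: look up one well-known symbol per platform.
if sys.platform == "win32":
    address = resolve_symbol("kernel32.dll", "GetTickCount")
else:
    address = resolve_symbol(None, "getpid")
print(hex(address))
```

A compiled Cython extern avoids even the ctypes layer shown here, which is one reason the real change can be cheaper than either pywin32 or this sketch.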
This has a measurable impact on import time of about 9% (import cuda.bindings.driver in a fresh interpreter), mainly by not spending time importing win32api.
It also improves "time to first call" by about 10% (since the first call resolves all of the dynamic function pointers and makes many win32 API calls).
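For anyone wanting to reproduce this kind of measurement, the following is a rough sketch of timing a statement in a fresh interpreter. It is not the benchmark used for the numbers above. The helper name time_in_fresh_interpreter is hypothetical, and json stands in for cuda.bindings.driver so the sketch runs without a CUDA install.

```python
import statistics
import subprocess
import sys


def time_in_fresh_interpreter(stmt, repeats=5):
    """Median wall time (seconds) of `stmt`, each run in a new process."""
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"{stmt}\n"
        "print(time.perf_counter() - t0)\n"
    )
    samples = []
    for _ in range(repeats):
        result = subprocess.run(
            [sys.executable, "-c", code],
            check=True,
            capture_output=True,
            text=True,
        )
        # The timing is the last line, even if `stmt` printed something.
        samples.append(float(result.stdout.strip().splitlines()[-1]))
    return statistics.median(samples)


# Stand-in statements (assumption): swap in "import cuda.bindings.driver"
# and a first driver call to measure the real package.
import_only = time_in_fresh_interpreter("import json")
import_and_first_call = time_in_fresh_interpreter("import json\njson.dumps({})")
print(f"import only:         {import_only * 1e3:.2f} ms")
print(f"import + first call: {import_and_first_call * 1e3:.2f} ms")
```

Running each sample in a new process matters because a second import in the same interpreter hits sys.modules and costs almost nothing. For a per-module breakdown, python -X importtime -c "import cuda.bindings.driver" shows where the import time goes.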
Checklist